
ARROW-5955: [Plasma] Support setting memory quotas per plasma client for better isolation #4885

Closed

ericl wants to merge 1 commit into apache:master from ericl:plasma-lru

Conversation

@ericl (Contributor) commented Jul 16, 2019

Currently, plasma evicts objects according to a single global LRU queue. In Ray, this often causes memory-intensive workloads to fail unpredictably, since a client that creates objects at a high rate can evict objects created by clients at lower rates. This happens even when the true working set of both clients is quite small.

This PR adds

  • support for configuring per-client LRU eviction queues at runtime
  • improved debugging of memory usage of clients

Details on the implementation:

QuotaAwarePolicy extends the basic eviction policy to implement per-client memory quotas. This effectively gives each client its own LRU queue, which caps its memory usage and protects this memory from being evicted by other clients.

The quotas are enforced when objects are first created, by evicting the necessary number of objects from the client's own LRU queue to cap its memory usage. Once that is done, allocation is handled by the normal eviction policy. This may result in the eviction of objects from the global LRU queue, if not enough memory can be allocated even after the evictions from the client's own LRU queue.
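The per-client enforcement step can be sketched in Python. This is a minimal illustration of the idea only, not the actual C++ implementation in cpp/src/plasma/quota_aware_policy.cc; the class and method names here are hypothetical:

```python
from collections import OrderedDict

class ClientQueue:
    """Sketch of a per-client LRU queue with a byte quota
    (hypothetical names; the real policy is implemented in C++)."""

    def __init__(self, quota_bytes):
        self.quota = quota_bytes
        self.objects = OrderedDict()  # object_id -> size, oldest first
        self.used = 0

    def add(self, object_id, size):
        """Insert an object; return the ids evicted from this client's
        own queue so that its usage stays under its quota."""
        evicted = []
        # Evict this client's oldest objects first, never touching
        # other clients' queues.
        while self.objects and self.used + size > self.quota:
            old_id, old_size = self.objects.popitem(last=False)
            self.used -= old_size
            evicted.append(old_id)
        self.objects[object_id] = size
        self.used += size
        return evicted
```

With a 100-byte quota, adding a second 60-byte object evicts the first from the client's own queue, leaving other clients' objects untouched; any further shortfall in the store would then be handled by the normal global eviction policy.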

Some special cases:

  • When a pinned object is "evicted" from a per-client queue, it is instead transferred into the global LRU queue.
  • When a client disconnects, its LRU queue is merged into the head of the global LRU queue.
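Both special cases can be sketched as follows (again a hypothetical Python illustration; the function names are invented and the real logic lives in the C++ QuotaAwarePolicy):

```python
from collections import OrderedDict

# Global LRU queue shared by all clients; oldest entries (the head)
# are evicted first.
global_lru = OrderedDict()  # object_id -> size


def evict_from_client(client_queue, object_id, pinned):
    """When a pinned object would be evicted from a per-client queue,
    transfer it into the global LRU queue instead of freeing it."""
    size = client_queue.pop(object_id)
    if pinned:
        global_lru[object_id] = size  # transferred, not freed
        return None
    return object_id  # actually evicted


def on_client_disconnect(client_queue):
    """Merge a disconnecting client's queue into the head (oldest end)
    of the global LRU queue, so its objects are first in line for
    eviction."""
    merged = OrderedDict()
    merged.update(client_queue)  # client's objects become the oldest
    merged.update(global_lru)
    global_lru.clear()
    global_lru.update(merged)
```

Merging at the head rather than the tail means a departed client's objects do not displace the live working sets of the remaining clients.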

cc @pcmoritz


This is nice!

@pcmoritz (Contributor) left a comment:

Looks great! Some small comments above. Also, am I correct to assume that if no quotas are set, the behavior is unchanged from before?

@ericl (Contributor, Author) commented Jul 16, 2019

That's right, if no clients set a quota, the behaviour should be identical to before.

@codecov-io commented Jul 18, 2019

Codecov Report

Merging #4885 into master will increase coverage by 1.63%.
The diff coverage is 95.03%.


@@            Coverage Diff             @@
##           master    #4885      +/-   ##
==========================================
+ Coverage   87.45%   89.09%   +1.63%     
==========================================
  Files         994      719     -275     
  Lines      140389   100613   -39776     
  Branches     1418        0    -1418     
==========================================
- Hits       122778    89641   -33137     
+ Misses      17249    10972    -6277     
+ Partials      362        0     -362
Impacted Files Coverage Δ
cpp/src/plasma/store.h 100% <ø> (ø) ⬆️
cpp/src/plasma/client.h 100% <ø> (ø) ⬆️
cpp/src/plasma/protocol.h 100% <ø> (ø) ⬆️
python/pyarrow/_plasma.pyx 89.28% <0%> (-2.82%) ⬇️
cpp/src/plasma/store.cc 91.33% <100%> (+0.19%) ⬆️
cpp/src/plasma/protocol.cc 94.29% <100%> (+0.52%) ⬆️
cpp/src/plasma/quota_aware_policy.cc 100% <100%> (ø)
cpp/src/plasma/quota_aware_policy.h 100% <100%> (ø)
cpp/src/plasma/eviction_policy.h 100% <100%> (ø) ⬆️
cpp/src/plasma/plasma.h 100% <100%> (ø) ⬆️
... and 286 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2746a23...f3638b8.

3 participants